Skip to content

Conversation

@matts1
Copy link
Contributor

@matts1 matts1 commented Feb 17, 2025

This is relevant for clang modules, as they are imported into the AST, but are actually part of a different TU. It can result in hundreds of milliseconds of additional time to also traverse the AST of these modules, and often for no benefit, as they are frequently already traversed in their own TU.

Without this PR, our raw pointer plugin to detect unsafe uses of raw pointers can add up to 1 second of overhead to check a file that only contains the line #include <cstddef>. With this PR, we can use the same logic as in #127161 to reduce this overhead to <10ms, a 100x speedup.

See https://issues.chromium.org/issues/351909443 for details and benchmarks.

@github-actions
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

This is relevant for clang modules, as they are imported into the AST, but are actually part of a different TU.
It can result in hundreds of milliseconds of additional time to also traverse the AST of these modules, and often for no benefit, as they are frequently already traversed in their own TU.
@matts1 matts1 marked this pull request as ready for review February 17, 2025 03:57
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Feb 17, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 17, 2025

@llvm/pr-subscribers-clang-modules

@llvm/pr-subscribers-clang

Author: Matt (matts1)

Changes

This is relevant for clang modules, as they are imported into the AST, but are actually part of a different TU. It can result in hundreds of milliseconds of additional time to also traverse the AST of these modules, and often for no benefit, as they are frequently already traversed in their own TU.

Without this PR, our raw pointer plugin to detect unsafe uses of raw pointers can add up to 1 second of overhead to check a file that only contains the line #include &lt;cstddef&gt;. With this PR, we can use the same logic as in #127161 to reduce this overhead to <10ms, a 100x speedup.

See https://issues.chromium.org/issues/351909443 for details and benchmarks.


Full diff: https://github.com/llvm/llvm-project/pull/127423.diff

4 Files Affected:

  • (modified) clang/include/clang/ASTMatchers/ASTMatchFinder.h (+7)
  • (modified) clang/lib/ASTMatchers/ASTMatchFinder.cpp (+2-1)
  • (modified) clang/unittests/ASTMatchers/ASTMatchersTest.h (+17-6)
  • (modified) clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp (+12)
diff --git a/clang/include/clang/ASTMatchers/ASTMatchFinder.h b/clang/include/clang/ASTMatchers/ASTMatchFinder.h
index a387d9037b7da..2b161a574d5b6 100644
--- a/clang/include/clang/ASTMatchers/ASTMatchFinder.h
+++ b/clang/include/clang/ASTMatchers/ASTMatchFinder.h
@@ -139,6 +139,13 @@ class MatchFinder {
     ///
     /// It prints a report after match.
     std::optional<Profiling> CheckProfiling;
+
+    /// Whether to traverse a Decl. This is relevant for clang modules, as they
+    /// are imported into the AST, but are actually part of a different TU.
+    /// It can result in hundreds of milliseconds of additional time to also
+    /// traverse the AST of these modules, and often for no benefit, as they
+    /// are frequently already traversed in their own TU.
+    std::optional<llvm::function_ref<bool(const Decl &)>> ShouldTraverseDecl;
   };
 
   MatchFinder(MatchFinderOptions Options = MatchFinderOptions());
diff --git a/clang/lib/ASTMatchers/ASTMatchFinder.cpp b/clang/lib/ASTMatchers/ASTMatchFinder.cpp
index 3d01a70395a9b..5d2f2065ceba1 100644
--- a/clang/lib/ASTMatchers/ASTMatchFinder.cpp
+++ b/clang/lib/ASTMatchers/ASTMatchFinder.cpp
@@ -1443,7 +1443,8 @@ bool MatchASTVisitor::objcClassIsDerivedFrom(
 }
 
 bool MatchASTVisitor::TraverseDecl(Decl *DeclNode) {
-  if (!DeclNode) {
+  if (!DeclNode || (Options.ShouldTraverseDecl &&
+                    !(*Options.ShouldTraverseDecl)(*DeclNode))) {
     return true;
   }
 
diff --git a/clang/unittests/ASTMatchers/ASTMatchersTest.h b/clang/unittests/ASTMatchers/ASTMatchersTest.h
index ad2f5f355621c..02bdcc3a3ab1f 100644
--- a/clang/unittests/ASTMatchers/ASTMatchersTest.h
+++ b/clang/unittests/ASTMatchers/ASTMatchersTest.h
@@ -59,6 +59,11 @@ class VerifyMatch : public MatchFinder::MatchCallback {
   const std::unique_ptr<BoundNodesCallback> FindResultReviewer;
 };
 
+inline ArrayRef<TestLanguage> langCxx11() {
+  static const TestLanguage Result[] = {Lang_CXX11};
+  return ArrayRef<TestLanguage>(Result);
+}
+
 inline ArrayRef<TestLanguage> langCxx11OrLater() {
   static const TestLanguage Result[] = {Lang_CXX11, Lang_CXX14, Lang_CXX17,
                                         Lang_CXX20, Lang_CXX23};
@@ -91,9 +96,11 @@ testing::AssertionResult matchesConditionally(
     const Twine &Code, const T &AMatcher, bool ExpectMatch,
     ArrayRef<std::string> CompileArgs,
     const FileContentMappings &VirtualMappedFiles = FileContentMappings(),
-    StringRef Filename = "input.cc") {
+    StringRef Filename = "input.cc",
+    MatchFinder::MatchFinderOptions Options =
+        MatchFinder::MatchFinderOptions()) {
   bool Found = false, DynamicFound = false;
-  MatchFinder Finder;
+  MatchFinder Finder(Options);
   VerifyMatch VerifyFound(nullptr, &Found);
   Finder.addMatcher(AMatcher, &VerifyFound);
   VerifyMatch VerifyDynamicFound(nullptr, &DynamicFound);
@@ -147,11 +154,13 @@ testing::AssertionResult matchesConditionally(
 template <typename T>
 testing::AssertionResult
 matchesConditionally(const Twine &Code, const T &AMatcher, bool ExpectMatch,
-                     ArrayRef<TestLanguage> TestLanguages) {
+                     ArrayRef<TestLanguage> TestLanguages,
+                     MatchFinder::MatchFinderOptions Options =
+                         MatchFinder::MatchFinderOptions()) {
   for (auto Lang : TestLanguages) {
     auto Result = matchesConditionally(
         Code, AMatcher, ExpectMatch, getCommandLineArgsForTesting(Lang),
-        FileContentMappings(), getFilenameForTesting(Lang));
+        FileContentMappings(), getFilenameForTesting(Lang), Options);
     if (!Result)
       return Result;
   }
@@ -162,8 +171,10 @@ matchesConditionally(const Twine &Code, const T &AMatcher, bool ExpectMatch,
 template <typename T>
 testing::AssertionResult
 matches(const Twine &Code, const T &AMatcher,
-        ArrayRef<TestLanguage> TestLanguages = {Lang_CXX11}) {
-  return matchesConditionally(Code, AMatcher, true, TestLanguages);
+        ArrayRef<TestLanguage> TestLanguages = {Lang_CXX11},
+        MatchFinder::MatchFinderOptions Options =
+            MatchFinder::MatchFinderOptions()) {
+  return matchesConditionally(Code, AMatcher, true, TestLanguages, Options);
 }
 
 template <typename T>
diff --git a/clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp b/clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp
index 068cf66771027..02badc50241d2 100644
--- a/clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp
+++ b/clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp
@@ -28,6 +28,18 @@ TEST(DeclarationMatcher, hasMethod) {
                          cxxRecordDecl(hasMethod(isPublic()))));
 }
 
+TEST(DeclarationMatcher, shouldTraverse) {
+  MatchFinder::MatchFinderOptions Options;
+  Options.ShouldTraverseDecl = [](const Decl &decl) { return true; };
+  EXPECT_TRUE(matches("class A { void func(); };",
+                      cxxRecordDecl(hasMethod(hasName("func"))), langCxx11(),
+                      Options));
+  Options.ShouldTraverseDecl = [](const Decl &decl) { return false; };
+  EXPECT_FALSE(matches("class A { void func(); };",
+                       cxxRecordDecl(hasMethod(hasName("func"))), langCxx11(),
+                       Options));
+}
+
 TEST(DeclarationMatcher, ClassDerivedFromDependentTemplateSpecialization) {
   EXPECT_TRUE(matches(
     "template <typename T> struct A {"

@zmodem
Copy link
Collaborator

zmodem commented Feb 20, 2025

This is relevant for clang modules, as they are imported into the AST, but are actually part of a different TU.

I'm not a modules or matcher expert, but this doesn't match my (perhaps uninformed) understanding. I thought that when we import a module, it is part of our TU, similarly to when we include a header.

I thought what we wanted to do was avoid walking the parts of the AST that come from a module but aren't used in the TU, and therefore haven't been deserialized.

@matts1
Copy link
Contributor Author

matts1 commented Feb 20, 2025

I'm not a modules or matcher expert, but this doesn't match my (perhaps uninformed) understanding. I thought that when we import a module, it is part of our TU, similarly to when we include a header.

You are correct. I phrased it very poorly. I'll reclarify - if we have a header file we depend on, then if the header file came from a different module, it would have already been checked when we compiled that module, so there's no need to recheck it.

I thought what we wanted to do was avoid walking the parts of the AST that come from a module but aren't used in the TU, and therefore haven't been deserialized.

Consider:

// module.cpp
#include "module.h"
// lib.cpp
#include "lib.h"
#include "module.h"

To compile lib, we perform something along the lines of:

clang++ -emit-module -o module.pcm -add-plugin raw-ptr-plugin module.cpp
clang++ -emit-module -o lib.pcm -add-plugin raw-ptr-plugin lib.cpp -fmodule-file module.pcm

The raw plugin will check module.cpp and module.h in the first action. Thus, there is no need to bother checking module.h when compiling lib, because it is in a different module. We could only check parts of module.h that we depend on, but that still seems to be redundant.

@Bigcheese
Copy link
Contributor

Is there a particular reason to have this be a generic predicate rather than just a bool of if other modules should be visited? I can see how you could do other things with it like skipping a namespace or something, but I'm not sure if the other use cases would actually see use.

If there is a reason for a generic predicate, then I think this implementation is fine.

@Bigcheese Bigcheese self-requested a review February 27, 2025 18:36
@matts1
Copy link
Contributor Author

matts1 commented Feb 27, 2025

I think other use cases could see use. For example, you could write a predicate to check whether it's in the std namespace to avoid checking the standard library, or a predicate to check whether a class is a subclass of proto.Message to skip generated protobuf code.

I don't personally need the generic predicate though, so if you'd prefer I can do a bool.

Copy link
Contributor

@Bigcheese Bigcheese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good from the modules end and I get how the predicate can be useful. My only concern is that this feels a bit like another matcher that's not using the matcher API, so I'd like someone more familiar with it to also approve, maybe @5chmidti or @HighCommander4.

@HighCommander4
Copy link
Collaborator

We've encountered a similar use case in clangd -- for fast incremental reparses, we split the file into a "preamble" which we build as a PCH, and a "main file" which consumes the PCH, and then we want to run e.g. clang-tidy checkers only on the "main file" part of the AST -- and we address it by using ASTContext::setTraveralScope() as seen here.

Perhaps this approach could be used for modules as well?

@atetubou
Copy link
Contributor

What is the current status of this PR? @matts1 is it possible to apply the idea from @HighCommander4 ?

@matts1
Copy link
Contributor Author

matts1 commented May 26, 2025

I've currently put it on hold since I'm prioritising allowing config mismatches

@cor3ntin cor3ntin changed the title [Feat] Allow Finding across only parts of an AST. [Clang] Allow Finding across only parts of an AST. May 26, 2025
@cor3ntin cor3ntin added the clang:modules C++20 modules and Clang Header Modules label May 26, 2025
@matts1
Copy link
Contributor Author

matts1 commented Jun 13, 2025

Can confirm that this approach appears to work.

@matts1 matts1 closed this Jun 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants